All prerequisites, links to material, and slides for this course can be found on GitHub, or can be downloaded as a zip archive from here.
Once the zip file is unarchived, all presentations (as HTML slides and pages), their R code, and the HTML practical sheets will be available in the directories underneath.
Something works on your computer (e.g. bioinformatics analysis or software deployment), and you want to make sure that it will work on another computer.
https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html - CC-BY 4.0
Docker allows for the creation of an isolated environment that can be shipped across different users, machines, or operating systems, and to virtual machines or the cloud.
https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html - CC-BY 4.0
The Docker client communicates with the Docker daemon based on user commands.
A daemon is a program that runs as a background process and is not under the direct control of the computer user. The Docker daemon is the engine that manages Docker services and objects.
The ‘docker build’ command uses a Dockerfile to create an image.
A Docker image is a read-only, isolated file system that contains all software, dependencies, scripts, and metadata required to run a container.
Once an image is built, an instance of this image can be launched as a stand-alone application, also known as a container.
There are public repositories of Docker images (e.g. Docker Hub), and typically you start with an existing image and build on top of this.
Use this link to install Docker.
Check the Docker version to confirm that the Docker command line tools are installed. This does not by itself confirm that the Docker daemon is running.
Code (terminal):
docker --version
If the previous command isn’t found, check the Docker Desktop advanced settings and make sure the CLI tools are available system-wide.
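To also check that the daemon is running, ‘docker info’ can be used, since it fails when the daemon cannot be reached. The following is a minimal sketch; the function name ‘check_docker’ is hypothetical and not part of the course material.

```shell
# check_docker: hypothetical helper, not part of the course material.
# Returns 0 if the docker CLI is found and the daemon answers, 1 otherwise.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH"
    return 1
  fi
  # client version only; this does not contact the daemon
  docker --version
  # 'docker info' needs a reachable daemon, so it doubles as a "running" check
  if docker info >/dev/null 2>&1; then
    echo "Docker daemon is running"
  else
    echo "docker CLI found, but the daemon is not reachable"
    return 1
  fi
}

check_docker || echo "Docker is not ready yet"
```

If the second message appears, starting Docker Desktop (or the Docker service) and rerunning the check is usually the next step.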
Rocker is a very useful source of images on Docker Hub for R and RStudio. We can pull these images immediately after installing Docker. Here we pull an image containing RStudio and a specific version of R.
Code (terminal):
docker pull rocker/rstudio:4.2.3
After pulling, the image is now available on our system to run.
Images have names, tags, and image IDs as shown in the output. The ID is a hash of the metadata and filesystem of the Docker image.
Code (terminal):
docker images
The image can also be confirmed in Docker Desktop.
Once the image is on our system, we can launch a container with the ‘docker run’ command.
Components of the run command:
* --rm: automatically removes the container when it exits; otherwise old, unused containers accumulate and take up disk space
* -p: the port on your computer to be exposed goes before the colon, and the port inside the container goes after the colon
* -e: sets an environment variable when the container is run; here PASSWORD becomes the password used to log in
* the last argument is the image name and tag (both seen with ‘docker images’)
Code (terminal):
docker run --rm \
-p 8787:8787 \
-e PASSWORD=password \
rocker/rstudio:4.2.3
While the container is running, we can go to ‘http://localhost:8787’ in a browser and log in with the user ‘rstudio’ and the password from ‘docker run’.
To see all containers running in the local environment, use the ‘docker ps’ command
Code (terminal):
docker ps
To stop the running container, press Ctrl+C in the terminal tab where it was launched.
Alternatively, open another tab and use the ‘docker stop’ command with the container ID listed by ‘docker ps’.
Code (terminal):
docker stop 6ee1e0e97bf8 # this is the ID from 'docker ps'
docker ps
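Copying the ID by hand can be avoided. As a sketch, ‘docker ps -q’ prints only the container IDs and ‘--filter ancestor=’ restricts the list to containers started from a given image. The helper name ‘stop_by_image’ is an assumption made for illustration.

```shell
# stop_by_image: hypothetical helper that stops every running container
# started from the given image (name:tag)
stop_by_image() {
  image="$1"
  # -q prints only IDs; the ancestor filter keeps containers from this image
  ids="$(docker ps -q --filter "ancestor=${image}")"
  if [ -z "$ids" ]; then
    echo "no running containers for ${image}"
    return 0
  fi
  # word splitting is intended here: each ID becomes one argument
  docker stop $ids
}

# only call the helper when the docker CLI is available
if command -v docker >/dev/null 2>&1; then
  stop_by_image rocker/rstudio:4.2.3
fi
```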
The Docker container has its own file system, and we can mount a local directory onto that file system with the ‘-v’ flag of the ‘docker run’ command.
Code (terminal):
# navigate to 'r_course' directory in downloaded material
cd /PathToDownloadedCourse/Reproducible_R-master/r_course
# launch docker container
docker run --rm \
-v ./data:/home/rstudio \
-p 8787:8787 \
-e PASSWORD=password \
rocker/rstudio:4.2.3
The RStudio interface now shows the files in the ‘data’ directory.
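Relative host paths such as ‘./data’ in ‘-v’ are not accepted by every Docker version; older releases expect an absolute path. A minimal sketch, assuming the command is run from the ‘r_course’ directory, builds the absolute path first. The variable names HOST_DIR and MOUNT_SPEC are illustrative.

```shell
# run from the 'r_course' directory; HOST_DIR and MOUNT_SPEC are illustrative names
HOST_DIR="$(pwd)/data"
MOUNT_SPEC="${HOST_DIR}:/home/rstudio"
echo "mounting ${MOUNT_SPEC}"

# launch only if the docker CLI is available and the directory exists
if command -v docker >/dev/null 2>&1 && [ -d "$HOST_DIR" ]; then
  docker run --rm \
    -v "$MOUNT_SPEC" \
    -p 8787:8787 \
    -e PASSWORD=password \
    rocker/rstudio:4.2.3
fi
```

Quoting the mount specification also keeps the command working when the path contains spaces.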
These files can be read into R, and also files can be written to the local environment
Code (R in docker image):
dataIn <- read.csv("readThisTable.csv")
head(dataIn, 2)
# add gene IDs and write to new file on local computer
dataIn$Gene_ID <- seq_len(nrow(dataIn))
write.csv(dataIn, "rnaseq_table_withIDs.csv")
The R environment files from this RStudio session are written to the working directory in the container and, because that directory is the mounted volume, they appear in the local directory as hidden folders.
This R environment will then be loaded the next time you launch an RStudio container with this volume mounted. If these folders (.config and .local) are removed, then a fresh RStudio session will be launched.
Code (terminal):
ls -a data
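To start again from a fresh RStudio session, the two hidden folders can be removed from the mounted directory. A small sketch follows; the function name ‘reset_rstudio_state’ is hypothetical.

```shell
# reset_rstudio_state: hypothetical helper that removes the hidden RStudio
# session folders (.config and .local) from a mounted directory
reset_rstudio_state() {
  dir="$1"
  # refuse to run on an empty or missing directory argument
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "not a directory: ${dir}"
    return 1
  fi
  rm -rf "${dir}/.config" "${dir}/.local"
}

# usage, from the 'r_course' directory:
# reset_rstudio_state ./data
# ls -a data
```

Other files in the directory, such as the data tables, are left untouched.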
The image we pull from Rocker contains base R and its associated packages. To customize the image, we will need to make a Dockerfile that adds to the Rocker image.
A Dockerfile provides the recipe to make the image. Using specialized commands, this file provides instructions to install the R packages and their dependencies.
Some examples:
* FROM: sets the base image; further instructions build on top of it
* RUN: executes a command as if in a terminal
* LABEL: adds metadata to the image
* COPY: copies files from the host system to the image file system
* CMD: the command that will be run when the container is launched
Here we start with the same RStudio base image we used previously, and then add some key R packages.
The first RUN command installs system dependencies that are common to R packages. This command looks for updates, installs, and cleans up unnecessary files. Adding more R packages could result in missing dependencies, which you can pick up in the log for the build command. Dependencies for CRAN packages can also be found here.
Then the R packages are installed using ‘install.packages’ or ‘BiocManager::install’ for Bioconductor packages.
Note: The ‘options(warn=2)’ at the beginning of the R command will stop the installation when there is a warning, making it easier to debug.
Finally, port 8787 is exposed, and the ‘init’ script that is included with the base RStudio image is set as the command that runs when the container is launched.
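As a hedged sketch of a Dockerfile that follows the steps above (this is not the course's actual file; the directory name ‘docker_sketch’, the label text, and the particular system libraries and R packages are assumptions for illustration), the file could be written out from the terminal like this:

```shell
# write an illustrative Dockerfile into its own directory so that no
# existing file is overwritten; 'docker_sketch' is a hypothetical name
mkdir -p docker_sketch
cat > docker_sketch/Dockerfile <<'EOF'
# same base image as used previously
FROM rocker/rstudio:4.2.3
LABEL description="RStudio with extra R packages (illustrative)"

# system libraries that many R packages need: update, install, clean up
RUN apt-get update && apt-get install -y --no-install-recommends \
    libxml2-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    zlib1g-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# options(warn=2) turns warnings into errors, so a failed install stops the build
RUN R -e 'options(warn=2); install.packages(c("BiocManager", "dplyr"))'
RUN R -e 'options(warn=2); BiocManager::install("DESeq2")'

# RStudio Server port and the init script shipped with the base image
EXPOSE 8787
CMD ["/init"]
EOF
```

A file like this could then be built with ‘docker build -t rocker/rstudio:4.2.3_v2 ./docker_sketch’. If the build log reports a missing library, adding the corresponding system package to the first RUN command is the usual fix.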
Code (terminal):
docker build -t rocker/rstudio:4.2.3_v2 ./data
Use the ‘docker images’ command to see the new image.
Code (terminal):
docker images
As done previously, use the ‘docker run’ command to launch a container with our customized RStudio session
Code (terminal):
docker run --rm \
-v ./data:/home/rstudio \
-p 8787:8787 \
-e PASSWORD=password \
rocker/rstudio:4.2.3_v2
The last argument of ‘docker build’ is the build context, which here is the directory that contains the Dockerfile.
This Dockerfile is not named ‘Dockerfile’, so we specify its exact path with the ‘-f’ argument.
Code (terminal):
docker build -t rocker/rstudio:4.2.3_salmon -f ./data/Dockerfile_salmon ./data/
Code (terminal):
docker images
Code (terminal):
docker run --rm \
-v ./data:/home/rstudio \
-p 8787:8787 \
-e PASSWORD=password \
rocker/rstudio:4.2.3_salmon
Code (R in docker image):
library(Herper)
# the environment name and miniconda path set in the Dockerfile
Herper::local_CondaEnv(new = "pipe_env",
pathToMiniConda = "/home/miniconda")
# test out salmon
system("salmon -h")
If we then want to share our images with someone else, or simply store them elsewhere for future use, we can push to Docker Hub.
Make sure you have an account on Docker Hub.
Code (terminal):
# log in and provide credentials used to sign into Docker Hub
# this will prompt you to enter username and password
docker login
# tag the image you want to push with your Docker Hub username and a tag name after the colon
# the ID is from the 'docker images' command
docker tag 98579f07a026 dougbarrows/rstudio_4.2.3_salmon:topush
# push to Docker Hub
docker push dougbarrows/rstudio_4.2.3_salmon:topush
renv and Docker can be used in tandem to easily recreate an R environment.
Code (R on local computer):
setwd("/PathToMyDownload/Reproducible_R-master/r_course/Data/renv_docker")
# load in packages to recreate environment we used previously
library(renv)
library(BiocManager)
library(Herper)
library(dplyr)
library(DESeq2)
library(tximport)
# initialize renv
renv::init()
The lock file generated by renv shows the versions of R and the loaded packages on my local computer.
At the time, I was using older versions of R and Bioconductor, and specific versions of each package. With Docker, we can easily use this lock file to recreate that exact same R environment.
R still needs to be installed to use renv, so we use Rocker again to install a specific version of R to match the renv lock file.
The lock file is in the same directory as the Dockerfile, and when building the image the lock file is copied to the image into a directory that is created and set as the working directory with the WORKDIR command.
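A hedged sketch of what such a Dockerfile might contain is shown here, written out from the terminal. It is not the course's actual file: the directory name ‘renv_sketch’ and the working directory ‘/renv_project’ are assumptions, and where renv places the restored packages depends on its library settings.

```shell
# write an illustrative Dockerfile; 'renv_sketch' is a hypothetical directory
# and the real build context must also contain the renv.lock file
mkdir -p renv_sketch
cat > renv_sketch/Dockerfile <<'EOF'
# R version chosen to match the one recorded in renv.lock
FROM rocker/rstudio:4.1.1

# WORKDIR creates the directory if needed and makes it the working directory
# ('/renv_project' is an assumed path)
WORKDIR /renv_project

# copy the lock file from the build context into the working directory
COPY renv.lock renv.lock

# install renv, then install the package versions recorded in the lock file
RUN R -e 'options(warn=2); install.packages("renv")'
RUN R -e 'renv::restore()'
EOF
```

The working directory is kept outside ‘/home/rstudio’ in this sketch so that mounting a local directory there at run time does not hide the lock file.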
Build the image with the build context (last argument) set to the directory containing the Dockerfile and the lock file, then launch a container.
Code (terminal):
# build the image
docker build -t rocker/rstudio:4.1.1_renv ./data/renv_docker
# launch a container
docker run --rm \
-v ./data:/home/rstudio \
-p 8787:8787 \
-e PASSWORD=password \
rocker/rstudio:4.1.1_renv
The exercise on Reproducibility in R can be found here.
For any suggestions, comments, edits, or questions (about the content or the slides themselves), please reach out on our GitHub and raise an issue.